(54) Title: METHOD FOR THE READILY COGNITIVE DISPLAY OF STRUCTURE-PROPERTY DATA




(57) Abstract: The method uses agglomerative hierarchical clustering to organize data on the basis of 2D chemical structure similarity across a pre-defined profile of biological assay values and related property values for a plurality of chemical compounds. The resultant hierarchical clusters are visualized as both a tree-map and a heatmap, providing simultaneous representations of cluster members along with their related property values. The heatmap and tree-map are linked and visually integrated. The simultaneous display and integration of the tree-map and heatmap provides dynamic, readily cognitive, multidimensional visualization of the structure-property data.
METHOD FOR THE READILY COGNITIVE DISPLAY OF STRUCTURE-PROPERTY DATA

This invention relates to the cognitive display of object and related subject data. This invention also relates to the readily cognitive display of structure-property data for a plurality of chemical compounds.

BACKGROUND OF THE INVENTION 

The substantial increase in chemical structure and biological activity data brought about through combinatorial chemistry and high-throughput screening (HTS) technologies has created the need for sophisticated graphical tools for visualizing and analyzing such data. Visualization of the structure-activity data is important in analyzing and understanding relationships within multidimensional data sets. The prior art chemoinformatics software applications apply standard clustering techniques to organize chemical compounds on the basis of 2D or 3D structural features, or on the basis of a biological activity associated with the chemical compounds.

The development of software for visualizing multidimensional structure-activity data is a significant challenge. Medicinal chemists require that such software be intuitive, provide tools to interact with both chemical structure and biological data, and support the organization and visualization of information-rich data. A number of software tools have been developed during the past approximately ten years to help chemists understand relationships between chemical structure and related activity data.

Navigator, as disclosed in Chapman, D.; Harris, N.; Park, J.; Critchlow, R. E. Jr. Navigator: Tools for informal structure-activity relationship discovery. J. Molecular Graphics 1995, 13, 242-249, was developed as a molecular database visualization tool in about 1995. Navigator relies on a maximal common subgraph algorithm to determine neighboring relationships among chemical structures. This approach to data organization is intuitive to most chemists, because it facilitates comparisons between compounds. Two compounds are considered related if more than half their structure is identical and if one molecule can be transformed into the other by breaking a single bond and replacing the substituent at this position.

VisualiSAR, as disclosed in Wild, D. J.; Blankley, C. J. VisualiSAR: A Web-based application for clustering, structure browsing, and structure-activity relationship study. J. Molecular Graphics 1999, 17, 85-89, is a web-based program that employs modal fingerprints along with Stigmata coloring of atoms to highlight common and unique structural features among compounds at various levels within a hierarchical cluster. VisualiSAR employs Daylight fingerprints, as disclosed in Cosgrove, D. A.; Willett, P. SLASH: A program for analyzing the functional groups in molecules. J. Molecular Graphics 1998, 16, 19-32, and Ward's clustering, as disclosed in Ward, J. H. Hierarchical Grouping to Optimize an Objective Function. J. Amer. Stat. Assoc. 1963, 58, 236-244, to organize chemical structures on the basis of their Tanimoto similarity. In addition, that software provides navigation tools for selecting among the various levels of the cluster hierarchy and displaying the chemical structures of cluster members. While VisualiSAR provides a useful means for visualizing chemical structures within specific clusters and cluster levels, it suffers from a common problem associated with prior art visual representation and navigation of hierarchical data. That is, VisualiSAR does not adequately convey the spatial relationships between clusters and cluster members among the various levels of the hierarchy.

An alternative to hierarchical agglomerative clustering is Optimizable K-Dissimilarity Selection (OptiSim), as disclosed in Clark, R. D. OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets. J. Chem. Inf. Comput. Sci. 1997, 37, 1181-1188, which Tripos, Inc. employs in SARNavigator (SARNavigator is available from Tripos, Inc., http://www.tripos.com/). SARNavigator presents structure-activity data in a "landscape view," wherein structurally similar compounds or clusters are plotted as circles of varying size within the central region of the landscape and dissimilar compounds are placed along the perimeter. The size of a circle represents the boundary of the cluster in structure space, and the circles are colored to correspond with specific activity data. While the "landscape view" is effective at providing a unifying visualization of structure and activity data, the relationship between compounds plotted in the central region of the landscape and those along the perimeter is effectively lost.

An alternative approach to analyzing structure-activity data focuses on identifying substructures that correlate with activity. The program SLASH, as disclosed in Cosgrove, D. A.; Willett, P. SLASH: A program for analyzing the functional groups in molecules. J. Molecular Graphics 1998, 16, 19-32, generates a set of functional groups from an input file, and then analyzes the distribution of these groups among the active compounds in the input data. The program LeadScope, as disclosed in Roberts, G.; Myatt, G. J.; Johnson, W. P.; Cross, K. P.; Blower, P. E. Jr. LeadScope: Software for Exploring Large Sets of Screening Data. J. Chem. Inf. Comput. Sci. 2000, 40, 1302-1314, keeps track of the number of compounds that possess specific functional groups, aromatics and heterocycles. Users of SLASH may exclude structures from consideration by setting limits on the range of specific structure (e.g., molecular weight, logP, and number of rotatable bonds) and activity data. LeadScope relies heavily on the use of histograms and scatter-plots, neither of which is well suited to visualizing SAR.

The challenge in presenting multidimensional chemical structure-activity data lies in the mapping of this data onto a two- or three-dimensional space. A common approach to reducing the dimensionality of a data set is non-linear mapping, and this is usually achieved through principal-component analysis (PCA), or multidimensional scaling (MDS). An alternative to PCA and MDS is the use of Kohonen neural networks to construct Self-Organizing Maps, as disclosed in Kohonen, T. in Self-Organizing Maps, 2nd Edition, Springer-Verlag, Berlin, 1997. Gedeck and Willett, as disclosed in Gedeck, P.; Willett, P. Visual and computational analysis of structure-activity relationships in high-throughput screening data. Curr. Opin. Chem. Bio. 2001, 6(4), 389-395, elaborate on the application of these techniques to visualizing structure-activity relationships (SARs) in high-throughput screening data in a recent review article. When applying non-linear mapping to structure-activity data, there is always a trade-off between choosing structure, or activity, as the primary means of representing the data. Organizing structure-activity data on the basis of chemical structure often interferes with the presentation of the corresponding activity data. Similarly, multidimensional activity data represented in 2D or 3D plots makes it difficult for a chemist to grasp underlying correlations between chemical structure and activity. Data visualization programs, such as SpotFire (Spotfire DecisionSite is available from Spotfire, Inc., http://www.spotfire.com/), are effective in their approach to representing multidimensional activity data in two or three dimensions. However, SpotFire does not manage chemical structures natively, nor does it provide dynamic visualization of hierarchical clusters.
Hierarchical clusters generally are represented as dendrograms, which can be difficult to navigate, especially when applied to large data sets. In about 1992, B. Shneiderman, as disclosed in Shneiderman, B. Tree visualization with Tree-maps: a 2-d space-filling approach. ACM Trans. Graphics 1992, 11, 92-99, created tree-maps as an alternate visualization for large, hierarchical data structures. A tree-map is a 2D space-filling approach in which each leaf of a tree is represented as a rectangle whose size and fill color correspond with specific attributes in the data being represented. The tree-map algorithm is also applied to depict hierarchical computer file systems in an efficient manner. Recently, the tree-map algorithm was applied to aid the visualization of gene expression data within the context of the gene ontology, as disclosed in Baehrecke, E. H.; Dang, N.; Babaria, K.; Shneiderman, B. Visualization and analysis of microarray and gene ontology data with treemaps. BMC Bioinformatics 2004, 5(1), 84-96.

A heatmap is used to visualize gene expression data of certain drug metabolizing enzymes in rat livers after treatment, as disclosed in Naoki Kiyosawa, Toshiyuki Watanabe, Kyoko Sakuma, Miyuki Kanbori, Fukuroi, Shizuoka, Ltd., Japan, Toxicology Letters 2003, 145(3), 281-289.

Data mining is generally understood to be a process that uses computerized data analysis tools to provide data patterns and relationships that are useful to draw conclusions. The objective of data mining is to produce, from given data, some new knowledge or insight. Data mining generally relies on a large number of databases, with resultant large numbers of indices and files. Such large numbers of indices and files are difficult to view and navigate, and are not readily visually cognizable.

The diverse arts including, by way of examples, the data mining and chemoinformatics arts, desire a dynamic and readily cognitive visualization of large volumes of data. The chemoinformatics art desires a method and system for the readily cognitive multidimensional visualization of data, particularly including structure and related property data for chemical compounds. The present invention provides a solution to these diverse arts' desired needs.

SUMMARY OF THE INVENTION

The present invention is a visualization tool for hierarchically structurable data, which provides a readily cognitive display of the data. In one principal aspect, the present invention is a method for the readily cognitive display of data for a plurality of subjects and their related object, which method includes: displaying the data in a tree-map, displaying the data in a heatmap, and integrating the tree-map and heatmap, whereby there is a readily cognitive display of the data. The invention employs a reciprocal nearest neighbor (RNN) algorithm to organize the data into hierarchical clusters and sub-clusters.

In another principal aspect, the present invention is a method for the readily cognitive visual display of the structure-property data of a plurality of compounds. In this aspect of the invention the method includes: displaying the structure-property data in a tree-map, displaying the structure-property data in a heatmap, and integrating the tree-map and heatmap, whereby there is a readily cognitive multidimensional display of the chemical structure-property data.

In yet another aspect, the present invention is a method for the readily cognitive display of the structure-property data of a plurality of compounds, which method includes: organizing the structure-property data by agglomerative hierarchical clustering, and displaying the structure-property data in a tree-map, wherein there is a readily cognitive multidimensional display of the structure-property data.
In still another aspect, the present invention is a method for the readily cognitive display of the structure-property data of a plurality of compounds comprising: organizing the structure-property data by agglomerative hierarchical clustering; and displaying the structure-property data in a heatmap; whereby there is a readily cognitive multidimensional display of the structure-property data.

The tree-map has a plurality of defined regions (e.g. rectangles), and the heatmap has a plurality of cells. Each heatmap cell is a respective row and column intersection. There is a 1:1 correspondence between a specific heatmap cell and a specific tree-map region. The tree-map and heatmap are integrated, whereby the user readily mouse links and navigates between the heatmap and tree-map displays.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A is a representation of a hierarchical cluster of ten compounds shown as a tree-map.

Figure 1B is a representation of a hierarchical cluster of ten compounds shown as a heatmap.

Figure 2 is a tree-map of 1883 compounds from the NCI GI50 diversity data set, with the compounds clustered according to the similarity between their 2D chemical fingerprints, and the rectangles of the tree-map colored according to each compound's GI50 in the OVCAR-3 cell line.

Figure 3 is a tree-map of 1883 compounds from the NCI diversity data set, with the compounds clustered on the basis of their GI50 profile across the OVCAR-3, -4, -5, and -8 cell lines, and the rectangles of the tree-map shaded according to each compound's GI50 in the OVCAR-3 cell line.

Figure 4 is a heatmap representation of the clustered profile from Figure 3, with the compounds with high values across the OVCAR-3, -4, -5, and -8 cell lines visible in the upper half of the heatmap, and wherein a portion of this region of the heatmap has been selected, with the corresponding structures for these compounds shown in the dialog below the heatmap.

Figure 5 is a heatmap of the NCTRER estrogen receptor binding data set, showing the compounds in the data set clustered according to the similarity between their 2D chemical fingerprints, wherein a relationship between structure and activity is readily cooperatively apparent from the overlap of activity category and assigned chemical class for the most active estrogen receptor binding compounds in the data set.

Figure 6 is a tree-map of the most active estrogen receptor binding compounds in the NCTRER data set showing the compounds clustered according to the similarity between their 2D chemical fingerprints, wherein the compounds tend to cluster according to their assigned chemical class as illustrated by the annotations added to the tree-map.

Figure 7A shows the structures of genistein and three related phytoestrogen isoflavones of Figure 6.

Figure 7B shows the structures of four steroids possessing aromatic A-rings selected from the tree-map of Figure 6.

Figure 8 is a Shneiderman prior art tree-map of the entire fly genome in the context of the gene ontology.

Figure 9 is a selection of the enzyme-activity region of the molecular-function region from the tree-map of Figure 8.

Figure 10 is a selection of the peptidase-activity region from Figure 9 showing which genes of the fly genome are associated with various peptidase enzymes, wherein the size and color of the rectangles displayed are based on the parameters set in the legend panel to the right of the tree-map.
Figure 11 is a tree-map of 296 compounds obtained from a patent search of GABA-A α5 agonists, which tree-map shows a clustering of the compounds based on the similarity of their chemical structures, and wherein the colors of the rectangles in the tree-map represent the respective assignees of the corresponding patented compound.


DESCRIPTION OF THE INVENTION 

Definitions 

The term "property" as used hereinbefore and hereinafter means a physical parameter, chemical parameter, physico-chemical parameter, biological parameter, clinical information, patent information, and other information, as well as biological activity (the term "activity" being afforded its well-understood meaning in the art). The following without limitation are examples of a property within the meaning of the present invention: biological (e.g., %inhibition, %activity, IC50), chemical class, therapeutic target, physicochemical (e.g., log P, log D, pKa, polar surface area, dipole moment), patent information (e.g., patent assignee, patent issue date, inventor), and clinical information (e.g. dose, indication, side effect data).

The term "patent" as used herein in connection with describing and claiming the invention broadly contemplates and means e.g. issued patents, published patent applications, statutory invention requests, disclosure documents, interference filings, reexamination filings, protests, defensive publications, and like U.S. and foreign documents.
The present invention, however, in addition to patents as defined hereinabove, broadly contemplates any information or informational materials, and by way of example without limitation includes: published or publicly available information (e.g. scientific, technical, medical, chemical and pharmaceutical journals, articles therein and abstracts of the articles), legal publications (e.g. case law, law journals), not generally speaking publicly available information such as company or business documents (e.g. research reports, correspondence) and the like.

The term "SAR" as used herein is a well-understood term in the chemoinformatics art and refers to chemical structure-activity relationship.

The term "SPR" as used herein is a coined term that refers to structure-property relationship, wherein the term "property" is as previously defined.
The term "recursive" as used herein is a coined term derived from the recursion process of defining an object in terms of itself, as generally discussed by Kenneth H. Rosen, in "Discrete Mathematics and Its Applications," 4th Ed. 1999, pp. 202-219.

Tree-Map and Heatmap

The tree-map algorithm traverses the branches of a hierarchical cluster recursively beginning with the root node. The tree-map algorithm begins with a defined region (rectangle) corresponding to the root of the hierarchical cluster. As each branch is visited, a rectangular region of the tree-map is split evenly along alternating vertical and horizontal centers. Upon reaching a terminal node in the hierarchical cluster, the corresponding rectangular region of the tree-map is associated with the terminal node of the cluster. Consequently, each rectangular region of the tree-map is uniquely associated with a single node in the cluster. An example of a tree-map generated from a hierarchical cluster of ten objects is shown in Figure 1(A). The perimeter of the tree-map corresponds to the root of the hierarchical cluster shown on the left side of Figure 1(A). The root node is split into a left and right branch, and this is reflected in the tree-map by dividing the rectangular region vertically into two equal halves. The left half of the tree-map corresponds to the left branch from the root of the cluster, and the right half of the tree-map corresponds to the right branch. The left branch of the cluster also possesses a left and right branch. The left half of the tree-map is split horizontally into two equal halves. The upper half represents the left sub-branch, while the lower half corresponds to the right sub-branch. Continuing with the left sub-branch in the cluster, the upper quarter of the tree-map is split vertically, and the left half of this rectangular region is assigned to node 1 in the cluster. Traversing the remaining branches of the hierarchical cluster results in the tree-map shown on the right side of Figure 1(A). Each rectangular region of the tree-map is identified by a distinguishing color corresponding to the value of a property (e.g., biological assay) associated with the chemical structure of the corresponding node in the cluster hierarchy. The following pseudo-code illustrates one possible implementation of the tree-map algorithm.

A Node structure retains information related to agglomerative hierarchical clustering (compoundID, leftChild, and rightChild) and the coordinates of its rectangular region in the tree-map.

    struct Node
    {
        int compoundID;
        Rect rectangularRegion;
        Node leftChild;
        Node rightChild;
    }

    void CreateTreemap(Node node, int left, int top, int right, int bottom, bool axis)
    {
        if (node.leftChild == 0 and node.rightChild == 0)
        {
            // this is a terminal sub-cluster
            node.rectangularRegion = Rect(left, top, right, bottom);
        }
        else if (axis)
        {
            // Divide rectangular region vertically
            CreateTreemap(node.leftChild, left, top, left+(right-left)/2, bottom, !axis);
            CreateTreemap(node.rightChild, left+(right-left)/2, top, right, bottom, !axis);
        }
        else
        {
            // Divide rectangular region horizontally
            CreateTreemap(node.leftChild, left, top, right, top+(bottom-top)/2, !axis);
            CreateTreemap(node.rightChild, left, top+(bottom-top)/2, right, bottom, !axis);
        }
    }

The command "CreateTreemap(root, 0, 0, 200, 200, true)" creates a tree-map 200 units wide by 200 units high, and begins with a vertical division of this bounded region.
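
By way of illustration only, the recursion described above may be sketched in Python as follows. This is a minimal sketch and not necessarily the implementation used in practice: the names `Node`, `create_treemap`, and `leaves` are chosen here for illustration, the three-compound cluster is hypothetical, and every non-terminal node is assumed to have exactly two children.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Node:
    compound_id: int = 0                  # meaningful for terminal nodes only
    left_child: Optional["Node"] = None
    right_child: Optional["Node"] = None
    rect: Optional[Rect] = None           # filled in for terminal nodes


def create_treemap(node, left, top, right, bottom, axis):
    """Assign a rectangle to every terminal node, alternating split direction."""
    if node.left_child is None and node.right_child is None:
        # Terminal sub-cluster: it owns the whole region passed in.
        node.rect = (left, top, right, bottom)
    elif axis:
        # Divide the region vertically (left half / right half).
        mid = left + (right - left) // 2
        create_treemap(node.left_child, left, top, mid, bottom, not axis)
        create_treemap(node.right_child, mid, top, right, bottom, not axis)
    else:
        # Divide the region horizontally (upper half / lower half).
        mid = top + (bottom - top) // 2
        create_treemap(node.left_child, left, top, right, mid, not axis)
        create_treemap(node.right_child, left, mid, right, bottom, not axis)


def leaves(node):
    """Collect terminal nodes from left to right."""
    if node.left_child is None and node.right_child is None:
        return [node]
    return leaves(node.left_child) + leaves(node.right_child)


# Hypothetical cluster ((1, 2), 3) laid out in a 200 x 200 region.
root = Node(left_child=Node(left_child=Node(1), right_child=Node(2)),
            right_child=Node(3))
create_treemap(root, 0, 0, 200, 200, True)
rects = {leaf.compound_id: leaf.rect for leaf in leaves(root)}
```

In this sketch compound 3, which lies one level below the root, receives the entire right half of the region, while compounds 1 and 2, which lie two levels below the root, share the left half.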

In the aforesaid manner, the tree-map provides immediate insight into the spatial relationship between items and among sub-clusters of the hierarchical cluster. The size of a rectangle in the tree-map correlates with the depth of the corresponding node in the hierarchical cluster. Compounds present at the same cluster level are depicted as rectangular regions of equal size in the tree-map. Compounds within a sub-cluster are depicted by a smaller tree-map bounded by a rectangular region within the main tree-map. A clustered set of structurally related and diverse compounds results in a tree-map characterized by regions of densely packed rectangles interspersed with more sparse regions. The sparse regions of a tree-map correspond to compounds belonging to sub-clusters that lie closer to the root of the hierarchical cluster.
A heatmap depicts a hierarchical cluster of items along its y-axis together with a hierarchical clustering of property data along its x-axis. An example of a heatmap representation of a cluster of ten compounds across five properties is shown in Figure 1(B). The left most terminal node in the cluster of compounds corresponds to the first row of the heatmap, and the right most terminal node corresponds to the last row. Likewise, the left most item in the cluster of properties corresponds to the first column of the heatmap, and the right most item in the cluster corresponds to the last column. At each row-column intersection, a rectangle is drawn and shaded to represent the value corresponding to that particular compound-property pair. Each rectangle of the heatmap is the same size.
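
The row and column ordering described above may be illustrated with a short Python sketch. It assumes, purely for illustration, that each hierarchical cluster is held as nested `(left, right)` tuples with labels at the terminal nodes; the function names, labels, and values are hypothetical.

```python
def leaf_order(tree):
    """Return terminal labels from left most to right most."""
    if isinstance(tree, tuple):
        left, right = tree
        return leaf_order(left) + leaf_order(right)
    return [tree]


def heatmap_cells(values, compound_tree, property_tree):
    """Order rows by the compound cluster and columns by the property cluster."""
    rows = leaf_order(compound_tree)
    cols = leaf_order(property_tree)
    # One equally sized cell per compound-property pair.
    grid = [[values[(r, c)] for c in cols] for r in rows]
    return rows, cols, grid


# Hypothetical clusters of three compounds and three properties.
compound_tree = (("c2", "c1"), "c3")
property_tree = ("p2", ("p1", "p3"))
values = {(c, p): 10 * int(c[1]) + int(p[1])
          for c in ("c1", "c2", "c3") for p in ("p1", "p2", "p3")}

rows, cols, grid = heatmap_cells(values, compound_tree, property_tree)
```

Here the first row of `grid` belongs to `c2`, the left most terminal node of the compound cluster, and the first column belongs to `p2`, the left most item of the property cluster.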

It has been found that a heatmap and corresponding tree-map provide complementary visualizations of clustered structure-activity data. The tree-map conveys both the topology of the corresponding hierarchical cluster and secondary information associated with each item of the cluster. In addition, the space-filling characteristics of the tree-map algorithm enable every sub-cluster within a dendrogram to be viewed simultaneously. For example, there are four sub-clusters at a depth of 2 from the root in the dendrogram of Figure 1(A). These four clusters are represented by the four quadrants of the corresponding tree-map: the upper-left quadrant is cluster (1, 2, 3); the lower-left quadrant is cluster (4, 5); the upper-right quadrant is cluster (6, 7, 8, 9); and the lower-right quadrant is cluster (10). A tree-map is limited to depicting only one property associated with items of the cluster. In contrast, a heatmap provides a visualization of cluster nodes across multiple property values. A heatmap, however, does not depict the hierarchy that exists between nodes within the cluster. When applied to a common hierarchical cluster of data, a tree-map may be regarded as a more detailed representation of a columnar cross-section of a heatmap.

MPX System

The present system is otherwise referred to herein as "MOLECULAR PROPERTY EXPLORER" or "MPX". The MPX graphical user interface consists of four major components (see e.g. Figure 2). The menu bar provides access to commands for opening a data set, modifying the graphical representation of the data, partitioning the data into smaller subsets on the basis of property or chemical structure criteria, and accessing on-line help. Below the menu bar is a tool bar that contains buttons for scaling the display region, toggling between heatmap and tree-map visualizations, clustering the data set, adding compounds to the data set, searching for compounds by name or by structure, and cycling through the display of property data over a predefined animation interval. The heatmap/tree-map display region occupies most of the application window. The tree-map visualization is annotated with a virtual map grid with letters along the left and numbers along the top edge of the tree-map. This grid provides visual reference to the absolute location of a zoomed region of the tree-map. To the right of the tree-map is a legend that defines the color assigned to each rectangle of the tree-map over a linear range for the selected property. A list of properties read from the data set is displayed to the left of the display region. Single or multiple items may be selected from the list, and the software automatically updates the display region to reflect the currently selected property. When multiple properties are selected from the property list, the slider below this list becomes active and may be used to choose amongst the set of selected properties (Figure 2).

Chemical structures must first be organized into a hierarchical cluster prior to visualization as a tree-map or heatmap. MPX clusters a data set on the basis of 2D chemical structure, or a set of related properties defining a profile. When clustering by chemical structure, MPX relies on the Accord Chemistry SDK to generate 2D fingerprints from chemical structures. The Accord Chemistry SDK uses an approach similar to the Daylight method of computing fingerprints from 2D structures (as previously discussed herein). (The Accord SDK is available from Accelrys, Inc., http://www.accelrys.com/.) MPX uses this fingerprint data to populate a lower triangular matrix with the Tanimoto similarity between pairs of compounds in the data set. Tanimoto similarity between a pair of chemical structures (A, B) is defined by Equation 1:

$$T_{AB} = \frac{c}{a + b - c} \qquad (1)$$

where a and b are the number of 1 bits appearing in the fingerprint of structures A and B, respectively, and c is the number of 1 bits in common to the fingerprints of structures A and B. The MPX software uses the matrix of Tanimoto similarity to cluster compounds, compute centroids of sub-clusters, and compute a group average similarity (see below) for the data set.
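
As an illustration of the Tanimoto similarity and the lower triangular matrix described above, the following Python sketch represents each fingerprint as an integer whose set bits are the 1 bits of the fingerprint. The representation, the function names, and the example fingerprints are assumptions made here for illustration; they are not taken from the Accord Chemistry SDK.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity c / (a + b - c) of two bit fingerprints."""
    a = bin(fp_a).count("1")          # 1 bits in fingerprint A
    b = bin(fp_b).count("1")          # 1 bits in fingerprint B
    c = bin(fp_a & fp_b).count("1")   # 1 bits common to A and B
    if a + b - c == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return c / (a + b - c)


def lower_triangular_similarity(fps):
    """Row i holds the similarity of compound i to compounds 0 .. i-1."""
    return [[tanimoto(fps[i], fps[j]) for j in range(i)]
            for i in range(len(fps))]


# Three hypothetical 4-bit fingerprints.
fps = [0b1111, 0b0111, 0b1000]
matrix = lower_triangular_similarity(fps)
```

Only the entries below the diagonal are stored, since the similarity is symmetric and the self-similarity of a non-empty fingerprint is always 1.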

Clustering is achieved using a reciprocal nearest neighbors (RNN) algorithm, and consists of two primary steps: computation of the distance between all items in the data set, followed by an agglomeration process in which sub-cluster hierarchies are formed. A discussion of the reciprocal nearest neighbor algorithm (RNN) is found in Murtagh, F. A Survey of Recent Advances in Hierarchical Clustering Algorithms. Computer Journal 1983, 26(4), 354-359. When clustering by chemical structure similarity, distances between pairs of compounds are computed as 1-Tanimoto. When clustering by property profile, the distance between pairs of data may be computed in one of four ways: Canberra, cosine, Euclidean, and 1-Tanimoto. Five linkage algorithms are supported within MPX: single, complete, un-weighted arithmetic average, weighted arithmetic average, and Ward's (as previously discussed herein).
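
The agglomeration step may be illustrated by the following deliberately naive Python sketch, which repeatedly merges a pair of clusters that are each other's nearest neighbor under un-weighted arithmetic average linkage. It is offered only as an assumption-laden illustration: the function names and the one-dimensional example data are hypothetical, distances are recomputed on demand rather than read from a precomputed matrix, and the efficient bookkeeping discussed by Murtagh is omitted.

```python
from itertools import product


def average_linkage(members_a, members_b, dist):
    """Un-weighted arithmetic average of all pairwise distances."""
    total = sum(dist(x, y) for x, y in product(members_a, members_b))
    return total / (len(members_a) * len(members_b))


def rnn_cluster(items, dist):
    """Return the hierarchy as nested (left, right) tuples of item indices."""
    def item_dist(i, j):
        return dist(items[i], items[j])

    # Each cluster is a pair: (sub-tree, list of member indices).
    clusters = [(i, [i]) for i in range(len(items))]

    def nearest(k):
        others = [m for m in range(len(clusters)) if m != k]
        return min(others, key=lambda m: average_linkage(
            clusters[k][1], clusters[m][1], item_dist))

    while len(clusters) > 1:
        # Find a reciprocal nearest neighbor pair (k, n).
        for k in range(len(clusters)):
            n = nearest(k)
            if nearest(n) == k:
                break
        merged = ((clusters[k][0], clusters[n][0]),
                  clusters[k][1] + clusters[n][1])
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (k, n)] + [merged]
    return clusters[0][0]


# Hypothetical one-dimensional data with two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
tree = rnn_cluster(points, lambda x, y: abs(x - y))
```

For these four points the sketch first joins items 0 and 1, then items 2 and 3, and finally joins the two resulting sub-clusters at the root. When clustering by chemical structure, `dist` would be 1-Tanimoto as described above.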

The MPX software displays three metrics in the title of the application window to aid interpretation of the corresponding hierarchical cluster. These are the number of compounds in the cluster, the group average similarity (GAS) between compounds, and the balance of the hierarchical cluster. The group average similarity is computed as the mean of the average Tanimoto similarity across rows of the similarity matrix, ignoring self-similarity. GAS ranges from 0 to 1. Data sets consisting of diverse chemical structures produce a GAS of approximately 0.5, while more focused data sets (e.g., combinatorial libraries) yield a GAS between 0.80 and 0.85.
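
A minimal Python sketch of the group average similarity follows. It assumes, for illustration, that the similarity matrix is supplied as a full symmetric list of lists with at least two compounds; the function name and the example values are hypothetical.

```python
def group_average_similarity(sim):
    """Mean over rows of the average off-diagonal similarity in each row."""
    n = len(sim)
    row_means = [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)  # skip self-similarity
        for i in range(n)
    ]
    return sum(row_means) / n


# Hypothetical symmetric Tanimoto matrix for three compounds.
sim = [
    [1.00, 0.75, 0.25],
    [0.75, 1.00, 0.00],
    [0.25, 0.00, 1.00],
]
gas = group_average_similarity(sim)
```

The three row averages are 0.5, 0.375, and 0.125, so the group average similarity of this small example is one third.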

Balance is a measure of the depth of a hierarchical cluster relative to the minimum depth of a binary tree consisting of equal capacity. Equation 2 defines balance of a hierarchical cluster:

$$\text{Balance} = \frac{\operatorname{ceil}(\log_2 N)}{D} \qquad (2)$$

where ceil is the ceiling function, N is the number of compounds in the cluster, and D is the actual depth of the hierarchical cluster. The numerator in Equation 2 above defines the minimum depth of a binary tree consisting of N nodes. Consequently, a balance of 1.0 indicates a hierarchical cluster with minimum depth, whereas a balance close to zero describes a hierarchical cluster with excessively long branches.
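
Equation 2 may be illustrated with the following Python sketch. It assumes that the hierarchy is held as nested `(left, right)` tuples, that depth is counted as the number of branches between the root and the deepest terminal node, and that the cluster contains at least two compounds; the function names and example trees are hypothetical.

```python
import math


def depth(tree):
    """Number of branches from this node down to its deepest terminal node."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[0]), depth(tree[1]))
    return 0


def balance(tree, n):
    """ceil(log2(N)) / D for a hierarchy of n compounds."""
    return math.ceil(math.log2(n)) / depth(tree)


# Two hypothetical hierarchies over the same four compounds.
balanced_tree = ((1, 2), (3, 4))     # depth 2, the minimum for N = 4
chained_tree = (((1, 2), 3), 4)      # depth 3, one long branch

balanced_value = balance(balanced_tree, 4)
chained_value = balance(chained_tree, 4)
```

The first hierarchy attains the minimum depth and therefore a balance of 1.0, while the chained hierarchy yields a balance of two thirds.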
Each rectangular region of the heatmap/tree-map represents a single compound in the data set. Clicking and dragging with the left mouse button over a region of the heatmap or tree-map displays the corresponding 2D structures within a separate Structure Viewer window. Selected regions of the heatmap/tree-map are outlined in black. Clicking on a selected rectangle will outline it in red and highlight its corresponding structure in the Structure Viewer with a white background. A user may export selected structure-activity data in SD or tab-delimited text format from the Structure Viewer dialog.

The MPX software utilizes all of the information contained within a structure-property data set at once. However, one may be interested in visualizing specific subsets independent of the larger data set. The MPX software provides four means of partitioning a data set into smaller subsets for independent visualization. The data in such subsets are represented as heatmaps/tree-maps in dialog windows separate from the main application window. A data set may be partitioned on the basis of property, or 2D structure criteria. Partitioning on the basis of property criteria involves specifying discrete ranges for a set of properties. Only compounds whose property values lie within each specified range are included in the subset.
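
Partitioning on the basis of property criteria may be sketched in Python as follows. The sketch assumes that each compound is a dictionary of property values and that the range limits are inclusive; the function name, property names, and values are hypothetical.

```python
def partition_by_property(compounds, ranges):
    """Keep compounds whose values lie within every specified range."""
    return [
        c for c in compounds
        if all(low <= c[name] <= high for name, (low, high) in ranges.items())
    ]


# Hypothetical compounds and property ranges.
compounds = [
    {"id": 1, "logP": 2.0, "mw": 300.0},
    {"id": 2, "logP": 5.5, "mw": 320.0},   # logP outside the range
    {"id": 3, "logP": 3.0, "mw": 650.0},   # molecular weight outside the range
]
ranges = {"logP": (0.0, 5.0), "mw": (0.0, 500.0)}

subset = partition_by_property(compounds, ranges)
subset_ids = [c["id"] for c in subset]
```

Only compound 1 satisfies both ranges and is retained in the subset.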

MPX offers three methods for partitioning a data set on the basis of 2D chemical structure
criteria: substructure match, similarity match, and R-group analysis. The subsets generated from such
partitions provide insight into the influence of specific structural features on SAR. The software
partitions a data set on the basis of substructure criteria by identifying compounds in the data set that
contain specified substructure(s). The user may define multiple substructure criteria and specify
whether a matching compound must contain all substructures, or at least one substructure. A partition
based on chemical similarity identifies those compounds of the data set that meet or exceed a minimum
Tanimoto similarity to a set of query compounds. Lastly, the MPX software can perform an "R-group"
analysis of a set of compounds possessing a common core structure. The user draws a core
substructure with designated "R-groups", and the MPX software generates lists of the unique
substituents present at each attachment point for all compounds in the data set that possess the core
substructure. These substituent lists are then used to define a query structure to be used in a
subsequent substructure search of the data set. Compounds that match the query structure are
included in the partitioned subset.
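
The "all substructures" versus "at least one substructure" choice in the substructure partition can be sketched as follows. This is an illustration under stated assumptions: the function name and the `contains` callback are hypothetical, and the toy model represents each compound as a set of fragment labels. A real application would supply a substructure matcher from a cheminformatics toolkit instead.

```python
def partition_by_substructure(compounds, queries, contains, require_all=True):
    """Select compounds that match all (or at least one) of the query substructures.

    contains(compound, query) is a caller-supplied predicate (hypothetical here)
    that reports whether the compound contains the query substructure.
    """
    combine = all if require_all else any
    return [c for c in compounds if combine(contains(c, q) for q in queries)]


# Toy stand-in for real structures: each compound is a set of fragment labels.
fragments = {
    "cpd-a": {"phenol", "ketone"},
    "cpd-b": {"phenol"},
    "cpd-c": {"amine"},
}

def has_fragment(compound, query):
    return query in fragments[compound]

must_have_all = partition_by_substructure(
    fragments, ["phenol", "ketone"], has_fragment, require_all=True
)
must_have_any = partition_by_substructure(
    fragments, ["phenol", "ketone"], has_fragment, require_all=False
)
```

With these toy data, only "cpd-a" contains both fragments, while "cpd-a" and "cpd-b" each contain at least one.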

Two examples illustrate the use of the MPX method to visualize multidimensional structure-activity
data sets. The first example employs the GI50 diversity set obtained from the National Cancer
Institute's Developmental Therapeutics Program (screening data and 2D structures of compounds
in the NCI GI50 diversity data set are available online: http://dtp.nci.nih.gov/webdata.html). The assay
values in this data set are reported as the negative log of the concentration of compound required to
inhibit the growth of a tumor cell line by fifty percent. The second example consists of an estrogenic
receptor binding data set obtained from the National Center for Toxicological Research Estrogen
Receptor (NCTRER) binding database (structures and estrogenic receptor binding data
are available online: http://www.epa.gov/nheerl/dsstox/sdf-nctrer.html).

The tree-map of Figure 2 represents a clustering by 2D chemical structure of 1883 compounds
in the National Cancer Institute's diversity data set. The group average similarity and balance of the
hierarchical cluster are shown in the title bar of the MPX application window. The group average
similarity for this set of structures is 0.5280, and this value is consistent with a set of structurally diverse
compounds. The balance of the hierarchical cluster of the compounds is 0^37, and suggests the
underlying hierarchical cluster possesses branches of considerable depth. Indeed, this is apparent from
the regions of densely packed rectangles in the upper-left quadrant of the tree-map. Concatenation of
the panel and cell-lines fields present in the GI50 data set formed the names that appear in the property
list to the left of the tree-map. The selected property corresponds to the OVCAR-3 cell line from the
OVA panel. The data for the OVA-OVCAR-3 property was truncated to the range 4.0-8.0 within the
MPX software. Compounds with an OVA-OVCAR-3 value below 4.0 were assigned the value 4.0, and
compounds with a value above 8.0 were assigned the value 8.0. The legend to the right of the tree-map
indicates that 1689 of the 1883 compounds have a GI50 between 4 and 5, 128 compounds have a GI50
between 5 and 6, 36 compounds have a GI50 between 6 and 7, and 14 compounds have a GI50 between
7 and 8. In addition, there are 16 compounds with unreported GI50 values. Compound identification and
property value are displayed in a tool-tip as the user passes the mouse over the tree-map.
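
The truncation of a property to the range 4.0-8.0 and the per-color counts shown in the legend can be sketched as below. This is only an illustration: the function names are hypothetical, and the bin-edge rule (a value on a boundary falls in the higher bin, except that 8.0 belongs to the last bin) is an assumption, since the text does not say how boundary values are assigned.

```python
import math
from collections import Counter


def truncate(value, low=4.0, high=8.0):
    """Clamp a reported value to [low, high]; unreported values stay None."""
    if value is None:
        return None
    return min(max(value, low), high)


def legend_counts(values, low=4.0, high=8.0):
    """Count compounds per unit-wide legend bin after truncation."""
    counts = Counter()
    for v in values:
        t = truncate(v, low, high)
        if t is None:
            counts["unreported"] += 1
            continue
        # Assumption: unit-wide bins; the upper limit belongs to the last bin.
        start = min(int(math.floor(t)), int(high) - 1)
        counts[(start, start + 1)] += 1
    return counts


# Hypothetical values, including out-of-range and unreported entries.
counts = legend_counts([3.2, 4.5, 5.0, 6.9, 8.0, 9.1, None])
```

In this toy input, 3.2 is raised to 4.0 and 9.1 is lowered to 8.0 before counting, so both still appear in the legend totals.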

The data set consists of a variety of diverse compound classes, clustered by chemical
structure. Consequently, there are multiple regions of SAR scattered throughout the tree-map of Figure
2. For example, compounds belonging to the camptothecin and ellipticine classes, both of which are
known potent inhibitors of tumor growth, as discussed in Ohashi, M.; Oki, T. Ellipticine and related
anticancer agents. Expert Opin. Ther. Pat. 1996, 6, 1285-1294 and Shi, L. M.; Myers, T. G.; Fan, Y.;
O'Connor, P. M.; Paull, K. D.; Friend, S. H.; Weinstein, J. N. Mining the National Cancer Institute
Anticancer Drug Discovery Database: Cluster Analysis of Ellipticine Analogs with p53-Inverse and
Central Nervous System-Selective Patterns of Activity. Molecular Pharmacology 1998, 53, 241-251,
are located within grid A-6 of the tree-map. One may instead be interested in visualizing all compounds
active against the OVCAR-3, OVCAR-4, OVCAR-5, and OVCAR-8 cell lines independent of their
chemical class. Such visualization is achieved in MPX by clustering the data on the basis of a property
profile as illustrated in Figure 3. The OVCAR-3, OVCAR-4, OVCAR-5, and OVCAR-8 cell lines were
selected from the property list, and the data was re-clustered (employing Euclidean distance and
complete linkage) using the tool bar's cluster button. Two distinct regions of densely packed rectangles
characterize the tree-map of the re-clustered data. These regions correspond to compounds that are
either potent or impotent inhibitors of tumor cell growth across the selected cell lines. Compounds with
intermediate profiles separate these two regions. The most potent inhibitors of tumor cell growth are
located within grids A-2 and B-2 of the tree-map.
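
The two ingredients named for the profile-based re-clustering, Euclidean distance and complete linkage, can be sketched as follows. This is a minimal illustration, not the MPX code: the function names and the toy profiles are assumptions, and the sketch assumes every compound has a reported value for each selected property.

```python
import math


def profile_distance_matrix(profiles):
    """Pairwise Euclidean distances between property profiles.

    profiles: list of equal-length lists, one value per selected property.
    """
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(profiles[i], profiles[j])
            dist[i][j] = dist[j][i] = d
    return dist


def complete_linkage(dist, cluster_a, cluster_b):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(dist[i][j] for i in cluster_a for j in cluster_b)


# Hypothetical profiles across four cell lines (one row per compound).
profiles = [
    [4.0, 4.0, 4.0, 4.0],  # weak across the profile
    [8.0, 8.0, 8.0, 8.0],  # potent across the profile
    [4.0, 4.0, 4.0, 5.0],  # nearly identical to the first compound
]
dist = profile_distance_matrix(profiles)
```

Compounds with similar profiles end up close together regardless of chemical class, which is why the re-clustered tree-map separates potent from weak inhibitors.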

A tree-map can represent only one property at a time, whereas a heatmap allows visualization
of multiple properties simultaneously. A heatmap of the GI50 data set clustered by profile across the
OVCAR-3, OVCAR-4, OVCAR-5, and OVCAR-8 cell lines is shown in Figure 4. The potent inhibitors
within grids A-2 and B-2 of the tree-map of Figure 3 are represented in the upper region of the
heatmap. Compound identification, property name and property value are displayed in a tool-tip as
the user passes the mouse over the heatmap. Eight compounds within this region have been selected and
the structures of the first four of these are shown in the Structure Visualizer at the bottom of Figure 4.
The compounds within this region are structurally diverse as indicated by a centroid of 0.5450
computed for the eight compounds selected. The compound identified by NSC 176327 is the current
selection in the Structure Visualizer, and its corresponding location in the heatmap is highlighted in red.
When the user toggles the display back to the tree-map, the corresponding tree-map rectangle for this
compound is likewise highlighted in red. The same color is a common compound identifier in the
tree-map and heatmap.

Generally, prior art commercial software applications that provide support for heatmaps do so
only in the context of two-way clustering. That is, the criteria used to cluster across rows of the heatmap
must be used to cluster across the columns. Heatmaps created within MPX allow two different sets of
criteria to be used to cluster across rows and columns. The advantages gained from treating rows and
columns of the heatmap as independent clusters are illustrated in the analysis of the NCTRER data set
that follows.

The NCTRER data set consists of 230 compounds from a variety of chemical classes. The data
set includes the properties "Activity Category ER-RBA" and "ChemClass ERB". Activity Category ER-RBA
classifies the estrogen receptor binding strength of each compound as one of the following:
inactive, slight binder, active weak, active medium, and active strong. ChemClass ERB assigns each
compound either to miscellaneous or to one of six broad chemical classes: biphenyls, diethylstilbestrol
(DES), diphenylmethanes, phenols, phytoestrogens, and steroids. Sub-types are used to further define
compounds within these classes. For example, phytoestrogen compounds fall into one of the following
subclasses: flavones, isoflavones, and mycoestrogens. The MPX software is compatible with both
continuous numeric and categorical text data. Hence, the text values associated with Activity Category
ER-RBA and ChemClass ERB properties did not have to be numerically encoded prior to analysis.

The compounds in the NCTRER data set were clustered by 2D chemical structure employing
1-Tanimoto distance and complete linkage. The heatmap of the clustered compounds and the clustered
properties Activity Category ER-RBA and ChemClass ERB is shown in Figure 5. The group average
similarity for these compounds is 0.5788, and the balance of the hierarchical cluster of compounds is
0.3478. Compounds assigned to the "inactive" Activity Category are colored yellow, and "active strong"
compounds are colored light blue. Assignments within ChemClass ERB also are represented in the
heatmap by rectangles filled with various shades of yellow or blue. Relationships between chemical
structure and estrogen receptor binding activity are readily apparent from the heatmap of Figure 5. The
chemical classes with the highest estrogen receptor binding affinity are identified on the right hand side
of the heatmap. It is important to note the correlation between the ordering of these compounds in the
hierarchical cluster and their assigned ChemClass ERB. Also, the most active compounds (light blue)
within each class lie adjacent to one another in the heatmap.

The NCTRER data set was partitioned into a smaller subset on the basis of Activity Category
being active medium or active strong. The compounds in the subset were clustered by 2D chemical
structure, which resulted in a general organization of the compounds by chemical class as is evident
from the tree-map of Figure 6. ChemClass ERB assignment is used to shade each rectangle of the
tree-map, and annotations marking the regions occupied by compounds belonging to the various
chemical classes have been added to the figure. The large rectangle occupying the right half of the
tree-map corresponds to the compound kepone, an unusual estrogen receptor binder assigned to the
chemical class "miscellaneous" and structurally dissimilar to every other compound in the subset.

The structures of four phytoestrogen isoflavones from grid A-1, and four steroids from grid O] of
the tree-map in Figure 6 are shown in Figure 7A and 7B, respectively. The interpretation of the SAR
within the two sets of structures is straightforward. Isoflavones become potent binders to estrogenic
receptors when hydroxyl groups in the 7 and 4' positions mimic the 4,4' OH positions in diethylstilbestrol,
as illustrated by genistein. Steroids possessing 3-hydroxy substituted, phenolic A-rings bind to
estrogen receptors, and the strength of this binding is increased when an oxygen atom is present at the
17-position.

In order to create a tree-map or heatmap from structure-property data, one must first generate
a hierarchical cluster of the chemical structures. There are a number of techniques for clustering
chemical structures, and agglomerative hierarchical clustering using the reciprocal nearest neighbors
(RNN) algorithm such as disclosed in Murtagh, F. A Survey of Recent Advances in Hierarchical
Clustering Algorithms. Computer Journal 1983, 26(4), 354-359, represents one approach. The RNN
algorithm is performed in two distinct steps.

The first step involves computing a two-dimensional matrix of the distance between all pairs of
chemical structures in the data set. There are a number of ways of computing the distance between two
chemical structures, and one approach is to compute the distance as 1-Tanimoto. The Tanimoto
coefficient is computed from the binary representations for a pair of chemical structures obtained from a
computation of their chemical fingerprints, and the following equation (note: this is the equation noted
earlier and is not renumbered for that reason):

    Tanimoto = c / (a + b - c)

where: a is the number of 1 bits appearing in the binary
representation of structure A, b is the number of 1 bits appearing in the binary representation of
structure B, and c is the number of 1 bits in common to the binary representations of both structures. A
number of algorithms exist for computing chemical fingerprints, including the Daylight algorithm. The
guide to Daylight theory is available online:
http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
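
The distance matrix of the first step can be sketched as follows, using the standard form of the Tanimoto coefficient, c / (a + b - c), with a, b, and c as defined above. The sketch is illustrative only: fingerprints are represented as Python sets of the positions of their 1 bits, the function names are hypothetical, and treating two all-zero fingerprints as identical is an assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a = len(fp_a)            # number of 1 bits in structure A
    b = len(fp_b)            # number of 1 bits in structure B
    c = len(fp_a & fp_b)     # number of 1 bits common to both
    if a + b - c == 0:
        return 1.0           # assumption: two empty fingerprints count as identical
    return c / (a + b - c)


def tanimoto_distance_matrix(fingerprints):
    """Symmetric matrix of 1 - Tanimoto distances between all pairs."""
    n = len(fingerprints)
    return [
        [0.0 if i == j else 1.0 - tanimoto(fingerprints[i], fingerprints[j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical fingerprints: two share half their bits, the third repeats the first.
fps = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 3, 4}]
matrix = tanimoto_distance_matrix(fps)
```

For the first two fingerprints a = 4, b = 4, and c = 2, so the coefficient is 2 / 6 and the distance is 4 / 6.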

The second step of the RNN algorithm involves linking the individual chemical structures
together to form the cluster hierarchy. Ward's method, as disclosed in Ward, J. H. Hierarchical
Grouping to Optimize an Objective Function. J. Amer. Stat. Assoc. 1963, 58, 236-244, is a common
approach for the linking phase of the RNN algorithm.
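
A linking phase driven by reciprocal nearest neighbors can be sketched as below. This is a hedged illustration rather than the method as implemented in MPX: it uses complete linkage (the largest pairwise distance) instead of Ward's method, because complete linkage can be updated directly from the distance matrix, and the function name, the cluster numbering, and the tie-breaking rule are all assumptions.

```python
def rnn_cluster(dist):
    """Agglomerate items by following a chain of nearest neighbors.

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns merges as (cluster_a, cluster_b, height). Items are numbered
    0..n-1 and each merge creates the next unused cluster id.
    """
    n = len(dist)
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    next_id = n
    chain = []
    merges = []
    while len(d) > 1:
        if not chain:
            chain.append(min(d))  # start a new chain from any active cluster
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            # Nearest neighbor of the chain tip; ties favor the previous
            # chain element so that the chain is guaranteed to terminate.
            nearest = min(d[top], key=lambda k: (d[top][k], k != prev, k))
            if nearest == prev:
                break  # tip and previous element are reciprocal nearest neighbors
            chain.append(nearest)
        a, b = chain.pop(), chain.pop()
        height = d[a][b]
        # Complete-linkage update: distance to the merged cluster is the maximum.
        merged = {k: max(d[a][k], d[b][k]) for k in d if k not in (a, b)}
        del d[a], d[b]
        for k, v in merged.items():
            del d[k][a], d[k][b]
            d[k][next_id] = v
        d[next_id] = merged
        merges.append((min(a, b), max(a, b), height))
        next_id += 1
    return merges


# Hypothetical one-dimensional example: two tight pairs far apart.
points = [0, 1, 5, 6]
merges = rnn_cluster([[abs(p - q) for q in points] for p in points])
```

In the toy example the two tight pairs are merged first, each at height 1, and the resulting clusters are joined last at height 6, the largest distance between their members.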

A hierarchical cluster of chemical structures is depicted as a tree-map by a recursive algorithm
described as follows. The tree-map algorithm traverses the branches of a hierarchical cluster
recursively beginning with the root node. The algorithm begins with a rectangular region corresponding
to the root of the hierarchical cluster. As each branch is visited, a rectangular region of the tree-map is
split evenly along alternating vertical and horizontal centers. Upon reaching a terminal node in the
hierarchical cluster, the corresponding rectangular region of the tree-map is associated with the
terminal node of the cluster. The previously discussed pseudo-code illustrates one possible
implementation of the tree-map algorithm.
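
The recursive splitting described above can also be sketched in Python. This is one possible reading of the description, not the pseudo-code referred to in the text: the cluster is assumed to be a binary tree written as nested tuples with string leaves, and the function name and the rectangle format (x, y, width, height) are hypothetical.

```python
def treemap(node, x, y, w, h, vertical=True, out=None):
    """Assign a rectangle (x, y, width, height) to every leaf of a binary cluster tree.

    node: either a leaf label (str) or a tuple (left_branch, right_branch).
    The region is halved at each branch, alternating between a vertical cut
    (left/right halves) and a horizontal cut (top/bottom halves).
    """
    if out is None:
        out = {}
    if not isinstance(node, tuple):
        out[node] = (x, y, w, h)  # terminal node: associate the region with it
        return out
    left, right = node
    if vertical:
        treemap(left, x, y, w / 2, h, False, out)
        treemap(right, x + w / 2, y, w / 2, h, False, out)
    else:
        treemap(left, x, y, w, h / 2, True, out)
        treemap(right, x, y + h / 2, w, h / 2, True, out)
    return out


# Hypothetical hierarchy: A and B cluster together; C joins them only at the root.
layout = treemap((("A", "B"), "C"), 0, 0, 8, 4)
```

In this sketch a compound that joins the hierarchy only at the root receives half of the total area, while compounds deep in a branch receive small rectangles, so densely packed regions correspond to deep sub-clusters.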

With specific reference to Figures 2-7B, there is shown one embodiment of the invention. The
properties associated with the chemical structures represented in the tree-map are shown in the list on
the left side of Figure 2. The tree-map occupies the central portion of the figure, and the colors used in
the tree-map to represent a chemical structure's value for a specific property are defined in the legend
to the right of the tree-map. The numbers to the right of the colored rectangles in the legend indicate the
number of rectangular regions assigned that color in the tree-map. The tree-map consists of regions of
densely packed rectangles that are separated by more sparse regions. The dense regions represent
sub-clusters of structurally related compounds. The names of compounds, their structures and property
values are presented in a tool-tip as the mouse cursor passes over the various rectangular regions in
the tree-map.

In one further preferred embodiment, a selected rectangle or defined region is recursively
partitioned to provide a second level tree-map (not shown) for cognitive visualization and analysis.

The data in Figure 2 were re-clustered based on the compounds' values across the four
specified properties (OVA-OVCAR-3, OVA-OVCAR-4, OVA-OVCAR-5, and OVA-OVCAR-8) and the
resulting tree-map is shown in the drawings.

The heatmap of the cluster of compounds and the cluster of selected properties (OVA-OVCAR-3,
OVA-OVCAR-4, OVA-OVCAR-5, and OVA-OVCAR-8) is shown in the drawings. As the mouse cursor
passes over the heatmap image, a tool-tip displays the name of the compound, the property and the
property value. The heatmap provides visualization of each compound's response across a range of
selected properties. Each column of the heatmap corresponds to one of the four selected properties.
The rows of the heatmap correspond to the order in which compounds appear in the hierarchical cluster
of chemical structures.
Shneiderman Tree-Maps

With specific reference to Figures 8-10, there are shown prior art tree-maps as disclosed and
shown in Baehrecke, E. H., Dang, N., Babaria, K., and Shneiderman, B.; "Visualization and analysis of
microarray and gene ontology data with treemaps," BMC Bioinformatics, 2004, 5(1), 84-96 (the
"Shneiderman tree-maps"). The Shneiderman tree-maps relate to the results of microarray gene
expression experiments mapped onto the gene ontology and do not relate to specific chemical
structure-property data. The Shneiderman tree-maps are based on a predefined hierarchy, namely, the
gene ontology. That is, the gene ontology defines the division of the tree-map into specific rectangular
regions, and then the gene expression data is mapped onto these fixed regions. The rectangular
regions of the tree-map serve merely as boundaries between elements of the gene ontology, and do
not represent relationships between the elements of the gene expression data being depicted. The
Shneiderman tree-maps necessarily are replete with explanatory verbiage related to the gene ontology,
rendering them not readily visually cognitive.

In marked contrast to the Shneiderman tree-maps, the layout or configuration of the tree-map
of the present invention depends on the hierarchy that results from clustering the specific chemical
structure-property data. That is, the division of a tree-map into rectangular regions is dynamic (it will be
different for every data set and clustering criteria employed), and the size and arrangement of these
rectangular regions conveys specific information about the relationship between chemical structures in
the cluster hierarchy. An advantage of representing a hierarchical cluster of chemical structures as a
tree-map is that the tree-map allows one to simultaneously visualize all sub-clusters within the
hierarchy.
MPX Applications

The MPX method and system provides a number of applications. The MPX method may be
used qualitatively to predict the properties of new compounds. A button on the tool bar provides a
mechanism for adding new chemical structures to a data set. As new compounds are added, the data
set is re-clustered and the tree-map is redrawn to reflect the placement of the new compounds within
the cluster. Assuming a sufficient number of compounds defining an SAR or SPR exists within the
original data set, the activity of new compounds may be inferred from their nearest neighbors within the
cluster. One may assess the appropriateness of such comparisons by comparing the structures of
nearest neighbors in a Structure Visualizer dialog. If a new compound appears centered within a dense
region of the tree-map with a defined SAR or SPR and the Structure Visualizer confirms the new
compound is structurally similar to its nearest neighbors, then the properties of those neighbors may be
ascribed to the new compound.
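
The idea of ascribing neighbor properties to a new compound can be sketched as follows. This is an illustration only: the text describes a qualitative, visually confirmed inference, whereas the sketch simply averages the values of the k nearest neighbors. The function name, the choice of k, the averaging, and the toy distances are all assumptions.

```python
def infer_from_neighbors(distances, values, k=3):
    """Estimate a property for a new compound from its nearest neighbors.

    distances: dict mapping existing compound id -> distance to the new compound
    values:    dict mapping existing compound id -> property value (None if unreported)
    Returns (estimate, neighbor_ids); the estimate is None if no neighbor has a value.
    """
    known = [(dist, cid) for cid, dist in distances.items()
             if values.get(cid) is not None]
    nearest = sorted(known)[:k]
    if not nearest:
        return None, []
    # Assumption: ascribe the plain average of the neighbors' values.
    estimate = sum(values[cid] for _, cid in nearest) / len(nearest)
    return estimate, [cid for _, cid in nearest]


# Hypothetical distances (for example 1 - Tanimoto) and property values.
distances = {"cpd-a": 0.10, "cpd-b": 0.20, "cpd-c": 0.90, "cpd-d": 0.15}
values = {"cpd-a": 6.0, "cpd-b": 7.0, "cpd-c": 4.0, "cpd-d": None}
estimate, neighbors = infer_from_neighbors(distances, values, k=2)
```

A large distance to even the closest neighbor would be a signal that the comparison is not appropriate, which corresponds to the structural check in the Structure Visualizer described above.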

Further, the combination of hierarchical clustering based on 2D chemical structure and
visualization as a tree-map provides a novel approach to representing the topology of the chemical
space for a set of chemical structures. Properties other than those relating to biological activity may be
mapped onto this topology. (See the definition and examples of "property" as set out hereinbefore.) For
example, the date on which a compound was synthesized and the name of the corresponding
therapeutic program could be incorporated into the data set. Such information would allow one to
visualize the discovery process within and across therapeutic projects. Shading the rectangles of the
tree-map by date provides a historic representation of the various medicinal chemistry strategies
applied within a project. A tree-map encoding the name of the therapeutic project for which a compound
was synthesized might be used to identify compounds applicable to other therapeutic programs.

Another useful property within the contemplation of the present invention is the visualization of
patent information related to the chemical compounds (e.g., assignee, inventor(s), claims relating to the
chemical structures under analysis). For example, one using the MPX method could color code the
assignee, search the patent information and determine which commonly owned patents disclose and/or
claim certain chemical structures.

With specific reference to Figure 11, there is shown an embodiment of this contemplation. The
data presented in this figure was obtained from a patent search of GABA-A α5 agonists. The tree-map
of Figure 11 represents the clustering of the GABA-A α5 agonists based on the similarity of their
chemical structures, and the rectangular regions of the tree-map are colored according to the assignee
of the patented chemical matter.

The development of software capable of representing multidimensional structure-property data
in a straightforward and intuitive manner is a challenge met by the present invention. The
simultaneous representation of clustered data as heatmaps and tree-maps is a dynamic means of
visualizing SAR or SPR. That is, combining these two powerful visualizations with a set
of data-mining tools in a software application provides a dynamic method and system for exploring
and understanding multidimensional, structure-activity data sets. The MPX method may be used to
identify regions of structural similarity and dissimilarity within a data set of compounds, segregate
compounds into distinct regions on the basis of a defined activity profile, and visualize relationships
between structure and activity.

The MPX system is preferably applied to moderately sized data sets of up to about 10,000
compounds, and most preferably, about 5,000 to 8,000 compounds. Larger data sets may take
significantly longer to cluster, and might then not be well represented as heatmaps and tree-maps.

The MPX method and system may be applied to a broad range of object and related subject
data, and is not limited to chemical structure (object) and chemical properties (subjects) data. Any
object-subjects data compatible with hierarchical clustering is within the contemplation of the present
invention. The subjects data is related to the object data, and the object data is organized into a
hierarchical cluster.

While the foregoing description and examples used rectangles as the defined regions of the
tree-map, it is within the contemplation of the present invention to use other geometric regions (e.g.,
triangles, hexagons, and the like). It is further within the contemplation of the present invention to use
combinations of geometric regions for display (e.g., rectangles, triangles, and hexagons in
combination) and optionally to vary the selection of the geometric shape depending upon a particular
type of object-subject data or structure. It is also within the contemplation of the present invention to
employ a different selection of geometric regions depending upon hierarchical position, iterative
analysis step or other selection decision.

It will also be recognized by those of appropriate skill in the art that the present invention is not
limited to the display of chemical related structures, but in effect, is applicable to a wide array of data
assemblies capable of any rational hierarchical structure. Examples include medical data (e.g., heart and
liver treatment data), therapeutic intervention data (treatment of a specific disorder), genealogical data,
commercial and investing data, market analysis and penetration data, voting and polling data, and other
forms of linkable data.

It is also contemplated that beyond an initial broad data display, additional recursive partitioning
or focusing of a user-selected region may occur to provide a second, tertiary or greater level viewing
with the tree-map and heatmap. It is additionally contemplated that three-dimensional viewing of the
data is available.

This invention may be embodied in other specific forms without departing from the essential
characteristics as described herein. The embodiments described above are to be considered in all
respects as illustrative only and not restrictive in any manner. The scope of the invention is indicated
by the following claims rather than by the foregoing description. Any and all changes which come
within the meaning and range of equivalency of the claims are to be considered within their scope.

CLAIMS

WHAT IS CLAIMED IS: 

1. A method for the readily cognitive display of object data for a plurality of related subjects or
structure property data of a plurality of compounds, comprising the steps of:

(a) displaying the data in a tree-map;

(b) displaying the data in a heatmap; and

(c) integrating the tree-map and heatmap, wherein there is a readily cognitive display of the
data.

2. The method of claim 1, wherein the data is hierarchically structurable.

3. The method of claim 1, further comprising organizing the data in a cluster hierarchy for
display in the tree-map and heatmap.

4. The method of claim 1, wherein the tree-map comprises a plurality of defined regions, and
the heatmap comprises a plurality of cells comprising row and column intersections.

5. The method of claim 4, wherein there is a 1:1 correspondence between a specific heatmap
cell and a specific tree-map region.

6. The method of claim 1, further comprising distinguishing an identifier for each subject, and
wherein the distinguishing identifier for each subject is the same in the heatmap and the tree-map.

7. The method of claim 1, wherein the tree-map comprises a plurality of defined regions, and
said method further comprises recursive nesting within a selected defined region to provide one of a
second order defined region and a second order derivative tree-map.

8. The method of claim 1, wherein the plurality of related subject data comprise chemical
compound structures and the object data comprises properties of the plurality of compounds.

9. The method of claim 1, wherein the plurality of related subjects comprise chemical
structures and the object data are informational materials related to the chemical structures.

10. The method of claim 9, wherein the informational materials comprise patent information.

11. The method of claim 1, wherein the property comprises a physico-chemical property
related to the plurality of compounds.

12. The method of claim 11, wherein the compounds have a related structure.

13. The method of claim 1, further comprising a step of organizing the data, wherein the step
of organizing comprises applying a reciprocal nearest neighbor algorithm to the data to generate a
cluster hierarchy.

14. A method for the cognitive display of structure-property data for a plurality of
chemical compounds comprising:

(a) organizing the structure-property data by agglomerative hierarchical clustering; and

(b) displaying the structure-property data in a tree-map; wherein there is a readily cognitive
display of the structure-property data.

15. The method of claim 14, further comprising a step of creating a first structure-property data
set and a second structure property data set, and respective tree-maps for the data sets, whereby said
tree-maps comprise defined regions, and wherein the regions are differently sized for the respective
data sets.

16. The method of claim 15, wherein the defined regions comprise geometric shapes
enabling a ready two-dimensional display, said geometric shapes selected from the group consisting of
rectangles, squares, triangles, hexagons, and combinations of the same.

17. The method of claim 14, wherein step (a) further comprises a step of applying a reciprocal
nearest neighbor algorithm to the data to generate the hierarchical clustering.

18. A method for the readily cognitive visual display of structure-property data for a plurality of
chemical compounds, comprising:

(a) organizing the structure-property data by agglomerative hierarchical clustering;

(b) displaying the structure-property data in a tree-map;

(c) displaying the structure-property data in a heatmap; and

(d) integrating the tree-map and heatmap, whereby there is a readily cognitive multidimensional
display of the structure-property data.

FIG. 6

A tree-map of the most active estrogen receptor binding compounds in
the NCTRER data set. The compounds are clustered according to the
similarity between their 2D chemical fingerprints. The compounds tend
to cluster according to their assigned chemical class as illustrated by the
annotations that have been added to the tree-map.